Tuning for software analytics: Is it really necessary?

Authors

  • Wei Fu
  • Tim Menzies
  • Xipeng Shen
Abstract

Context: Data miners have been widely used in software engineering to, say, generate defect predictors from static code measures. Such static code defect predictors perform well compared to manual methods, and they are both easy to use and useful. But one of the “black arts” of data mining is setting the tunings that control the miner. Objective: We seek a simple, automatic, and very effective method for finding those tunings. Method: For each experiment with different data sets (from open source Java systems), we first ran differential evolution as an optimizer to explore the tuning space, then tested the resulting tunings on hold-out data. Results: Contrary to our prior expectations, we found that tuning was remarkably simple: it required only tens, not thousands, of attempts to obtain very good results. For example, when learning software defect predictors, this method can quickly find tunings that alter detection precision from 0% to 60%. Conclusion: Since (1) the improvements are so large, and (2) the tuning is so simple, we need to change standard methods in software analytics. At least for defect prediction, it is no longer enough to just run a data miner and present the result without conducting a tuning optimization study. The implication for other kinds of analytics is now an open and pressing issue.
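The Method step in the abstract (differential evolution over a tuning space with a small evaluation budget) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the control settings (`pop_size`, `generations`, `f`, `cr`), the bounds, and the toy scoring function are all assumptions chosen for illustration. In a real study the score would come from training a learner with the candidate tunings and measuring, for example, precision on a tuning set, with the final tunings then checked once on hold-out data.

```python
import random


def differential_evolution(score, bounds, pop_size=10, generations=5,
                           f=0.75, cr=0.3, seed=1):
    """Maximise score(candidate) inside box bounds using DE/rand/1/bin.

    All default settings here are illustrative assumptions.
    """
    rng = random.Random(seed)
    dims = len(bounds)
    # Start from a random population of candidate tunings.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [score(c) for c in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct candidates other than the current one.
            others = [p for j, p in enumerate(pop) if j != i]
            a, b, c = rng.sample(others, 3)
            forced = rng.randrange(dims)  # at least one dimension mutates
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == forced or rng.random() < cr:
                    value = a[d] + f * (b[d] - c[d])
                    value = min(max(value, lo), hi)  # clamp to the bounds
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_score = score(trial)
            # Keep the trial only if it is no worse than the current candidate.
            if trial_score >= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


# Hypothetical two-parameter tuning space (assumed, not from the paper).
BOUNDS = [(0.0, 1.0), (1.0, 20.0)]


def toy_validation_score(tunings):
    """Stand-in for 'train with these tunings, score on a tuning set'."""
    threshold, depth = tunings
    return -((threshold - 0.3) ** 2 + ((depth - 7.0) / 10.0) ** 2)


if __name__ == "__main__":
    best, best_score = differential_evolution(toy_validation_score, BOUNDS)
    print(best, best_score)
```

With these assumed defaults the search evaluates the scoring function 60 times in total (10 initial candidates plus 10 trials in each of 5 generations), which is in the spirit of the "tens, not thousands" budget the abstract reports.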


Similar resources

Is "Better Data" Better than "Better Data Miners"? (On the Benefits of Tuning SMOTE for Defect Prediction)

We report and fix an important systematic error in prior studies that ranked classifiers for software analytics. Those studies did not (a) assess classifiers on multiple criteria and they did not (b) study how variations in the data affect the results. Hence, this paper applies (a) multi-criteria tests while (b) fixing the weaker regions of the training data (using SMOTUNED, which is a self-tun...


P-V-L Deep: A Big Data Analytics Solution for Now-casting in Monetary Policy

The development of new technologies has confronted the entire domain of science and industry with issues of big data's scalability as well as its integration with the purpose of forecasting analytics in its life cycle. In predictive analytics, the forecast of near-future and recent past - or in other words, the now-casting - is the continuous study of real-time events and constantly updated whe...


A Quality Model for Actionable Analytics in Rapid Software Development

Background: Accessing relevant data on the software product, process, and usage as well as integrating and analysing it is crucial to get reliable and timely actionable insights for continuously managing software quality in Rapid Software Development (RSD). In this context, several software analytics tools have been developed in recent years. However, there is a lack of explainable software ana...


A Monte Carlo-Based Search Strategy for Dimensionality Reduction in Performance Tuning Parameters

Redundant and irrelevant features in high dimensional data increase the complexity in underlying mathematical models. It is necessary to conduct pre-processing steps that search for the most relevant features in order to reduce the dimensionality of the data. This study made use of a meta-heuristic search approach which uses lightweight random simulations to balance between the exploitation of ...


An ANOVA Based Analytical Dynamic Matrix Controller Tuning Procedure for FOPDT Models

Dynamic Matrix Control (DMC) is a widely used model predictive controller (MPC) in industrial plants. The successful implementation of DMC in practical applications requires a proper tuning of the controller. The available tuning procedures are mainly based on experience and empirical results. This paper develops an analytical tool for DMC tuning. It is based on the application of Analysis of V...



Journal:
  • Information & Software Technology

Volume 76

Publication date: 2016